feat(arrow-avro): accept default value of null for Avro union with null type in any branch position [avro 1.12] #9487
Open
mzabaluev wants to merge 5 commits into apache:main from
Test the Avro 1.12 spec behavior of resolving default values in the specific case where the default value for a field added in the reader schema is null, and null is the second branch in the field's union type.
Avro 1.12, new rules.
Introduce the "avro_1_12" feature flag and use it to guard the acceptance of JSON null defaults for union types that have the null schema in a position other than the first.
Which issue does this PR close?
Rationale for this change
The Avro specification version 1.12 extends acceptance of default values for unions to match any schema branch in the union rather than only the first.
This change implements the new behavior in the specific case of the default value being null, which is important for some real-world cases of Iceberg schema evolution. Spark converts nullable fields in its SQL schema to Avro field types with the null variant listed last. When a column is added to an Iceberg table backed by Avro files, the default value of its field in the reader schema must be specified as null.
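To make the scenario concrete, here is a minimal sketch of the kind of reader-schema field described above: a nullable union with the null branch listed last and a null default. The helper name `nullable_field_null_last` and the field name `added_col` are hypothetical, chosen only for illustration; the helper does no JSON escaping and is not part of arrow-avro, Spark, or Iceberg.

```rust
/// Hypothetical helper: builds the JSON for an Avro record field whose type
/// is a nullable union with the null branch listed LAST and whose default
/// value is null. Assumes `name` and `inner_type` need no JSON escaping.
fn nullable_field_null_last(name: &str, inner_type: &str) -> String {
    // Doubled braces are literal braces in `format!`.
    format!(
        r#"{{"name": "{name}", "type": ["{inner_type}", "null"], "default": null}}"#
    )
}
```

Calling `nullable_field_null_last("added_col", "int")` yields a field definition that the Avro 1.11 rules reject, because the null default does not match the first branch (`int`), while the 1.12 rules accept it.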
What changes are included in this PR?
Introduce the "avro_1_12" feature as requested by #8703.
When this feature is enabled, change the validation of a null default value for union and nullable types to allow null in any branch position (for unions treated as Arrow unions) and in either nullability order (for unions treated as nullable types).
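The validation rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `Schema` enum is a simplified stand-in for the crate's schema types, `null_default_is_valid` is a hypothetical name, and the `avro_1_12` boolean parameter stands in for the compile-time feature flag so the two behaviors can be compared side by side.

```rust
/// Simplified stand-in for an Avro schema node (not arrow-avro's actual type).
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Null,
    Int,
    Union(Vec<Schema>),
}

/// Hypothetical check: is a JSON `null` default acceptable for `schema`?
/// `avro_1_12` is a runtime stand-in for the "avro_1_12" feature flag.
fn null_default_is_valid(schema: &Schema, avro_1_12: bool) -> bool {
    match schema {
        Schema::Null => true,
        Schema::Union(branches) => {
            if avro_1_12 {
                // Avro 1.12: the default may match any branch of the union.
                branches.iter().any(|b| *b == Schema::Null)
            } else {
                // Avro 1.11: the default must match the first branch.
                branches.first() == Some(&Schema::Null)
            }
        }
        _ => false,
    }
}
```

With this sketch, a union of `[Int, Null]` passes only when `avro_1_12` is true, whereas `[Null, Int]` passes under both rule sets.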
Are these changes tested?
Added a test gated by the newly introduced "avro_1_12" feature to exercise the ["int", "null"] type with the default of null.
Are there any user-facing changes?
This is a behavioral change: more schema resolution cases are accepted than the Avro 1.11 spec permits. Despite the feature gating, there may be unexpected effects, as explained in #8703 (comment).